Specialised transformer-based models (e.g. BioBERT and BioMegatron) are adapted for the biomedical domain based on publicly available biomedical corpora. As such, they have the potential to encode large-scale biological knowledge. We investigate the encoding and representation of biological knowledge in these models, and their potential utility to support inference in cancer precision medicine, namely the interpretation of the clinical significance of genomic alterations. We compare the performance of different transformer baselines; we use probing to determine the consistency of encodings for distinct entities; and we use clustering methods to compare and contrast the internal properties of the embeddings of genes, variants, drugs and diseases. We show that these models do indeed encode biological knowledge, although some of it is lost in fine-tuning for specific tasks. Finally, we analyse the models' behaviour with regard to biases and imbalances in the dataset.
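The clustering analysis described above can be sketched in a few lines. The vectors below are random stand-ins, not actual BioBERT/BioMegatron embeddings, and the clean two-cluster geometry is an assumption for illustration:

```python
import numpy as np

# Toy stand-ins for entity embeddings: in the abstract's setting these would be
# transformer vectors for genes, drugs, etc. (hypothetical data here).
rng = np.random.default_rng(0)
gene_vecs = rng.normal(loc=0.0, scale=0.3, size=(20, 8))
drug_vecs = rng.normal(loc=3.0, scale=0.3, size=(20, 8))
X = np.vstack([gene_vecs, drug_vecs])

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: assign to nearest center, recompute means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(X, k=2)
# Agreement between clusters and the true entity types, up to label swap.
same = (labels == np.r_[np.zeros(20), np.ones(20)]).mean()
purity = max(same, 1 - same)
```

If the model's embeddings separate entity types, the purity of such a clustering will be high; mixed clusters would suggest the types are not linearly organized in the space.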
To interpret neural NLI models and their reasoning strategies, we carry out a systematic probing study to investigate whether these models capture crucial aspects of natural logic: monotonicity and concept inclusion. Correctly identifying valid inferences in downward-monotone contexts is a known stumbling block for NLI performance, subsuming linguistic phenomena such as negation scope and generalized quantifiers. To understand this difficulty, we emphasize monotonicity as a property of a context and examine the extent to which models capture monotonicity information in the contextual embeddings that are intermediate to their decision-making process. Drawing on recent advances in probing paradigms, we compare the presence of monotonicity features across various models. We find that monotonicity information is notably weak in the representations of popular NLI models that achieve high scores on benchmarks, and observe that improvements to these models based on fine-tuning strategies introduce stronger monotonicity features, along with improved performance on challenge sets.
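A linear probe of the kind this study relies on can be sketched minimally: train a logistic classifier on frozen embeddings to predict a property (here, monotonicity direction). The embeddings are synthetic stand-ins, and the assumption that one dimension carries the signal is purely illustrative:

```python
import numpy as np

# Synthetic "contextual embeddings" with a planted monotonicity signal.
rng = np.random.default_rng(1)
n, d = 200, 16
X = rng.normal(size=(n, d))
y = (X[:, 3] > 0).astype(float)  # assume dim 3 encodes upward (1) vs downward (0)

# Logistic-regression probe trained by plain gradient descent.
w = np.zeros(d)
b = 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # predicted probability
    g = p - y                            # gradient of the logistic loss
    w -= 0.1 * (X.T @ g) / n
    b -= 0.1 * g.mean()

probe_acc = (((X @ w + b) > 0) == (y == 1)).mean()
```

High probe accuracy is evidence that the property is linearly decodable from the representations; in the study's finding, this accuracy is low for benchmark-tuned NLI models unless monotonicity-aware fine-tuning is applied.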
The regeneration of natural language explanations in the scientific domain has been proposed as a benchmark for evaluating complex multi-hop and explainable inference. In this context, large language models can achieve state-of-the-art performance when employed as cross-encoder architectures and fine-tuned on explanations. However, while much attention has been devoted to the quality of the explanations, the problem of performing inference efficiently has been largely neglected. In fact, cross-encoders are intrinsically not scalable, and have limited applicability to practical scenarios requiring inference over massive facts banks. To enable complex multi-hop reasoning at scale, this paper focuses on bi-encoder architectures, investigating the problem of scientific explanation regeneration at the intersection of dense and sparse models. Specifically, we present SCAR (for Scalable Autoregressive Inference), a hybrid framework that iteratively combines a transformer-based bi-encoder with a sparse model of explanatory power, designed to leverage explicit inference patterns in the explanations. Our experiments demonstrate that the hybrid framework significantly outperforms previous sparse models, achieving performance comparable with that of state-of-the-art cross-encoders while being approximately 50 times faster and scalable to corpora of millions of facts. Further analyses on semantic drift and multi-hop question answering reveal that the proposed hybridisation boosts the quality of the most challenging explanations, contributing to improved performance on downstream inference tasks.
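The scalability argument for bi-encoders can be made concrete with a sketch: facts and hypotheses are embedded independently, so the fact bank can be pre-encoded once and scored with a single matrix product, whereas a cross-encoder must re-encode every (hypothesis, fact) pair. The vectors and the iterative update below are illustrative stand-ins, not SCAR itself:

```python
import numpy as np

rng = np.random.default_rng(2)
# Precomputed, unit-normalized fact embeddings (stand-ins for encoder outputs).
fact_bank = rng.normal(size=(10_000, 64))
fact_bank /= np.linalg.norm(fact_bank, axis=1, keepdims=True)

def retrieve(query_vec, bank, hops=3):
    """Toy iterative retrieval: after each hop, fold the retrieved fact into
    the query vector, loosely mimicking an autoregressive explanation chain."""
    q = query_vec / np.linalg.norm(query_vec)
    chosen = []
    for _ in range(hops):
        scores = bank @ q           # one matmul scores the entire bank
        idx = int(scores.argmax())
        chosen.append(idx)
        q = q + bank[idx]           # condition the next hop on what was found
        q /= np.linalg.norm(q)
    return chosen

query = rng.normal(size=64)
hops = retrieve(query, fact_bank)
```

Because scoring is a matmul over precomputed vectors, cost per hop grows linearly with the bank size and parallelizes trivially, which is what makes million-fact corpora tractable.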
This paper presents Diff-Explainer, the first hybrid framework for explainable multi-hop inference that integrates explicit constraints with neural architectures through differentiable convex optimization. Specifically, Diff-Explainer allows for the fine-tuning of neural representations within a constrained optimization framework to answer and explain multi-hop questions in natural language. To demonstrate the efficacy of the hybrid framework, we combine existing ILP-based solvers with transformer-based representations. An extensive empirical evaluation on scientific and commonsense QA tasks demonstrates that the integration of explicit constraints in an end-to-end differentiable framework can significantly improve the performance of non-differentiable ILP solvers (8.91%-13.3%). Moreover, additional analyses reveal that Diff-Explainer achieves strong performance when compared with standalone transformers and previous multi-hop approaches, while still providing structured explanations in support of its predictions.
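The core idea of making a discrete solver differentiable starts with relaxation. A heavily hedged sketch, not the paper's actual solver: relax an ILP-style selection of supporting facts to continuous weights in [0, 1] and solve the relaxed problem by projected gradient. A differentiable convex layer would additionally let gradients flow through this argmin into the neural scores; here only the forward relaxation is shown, with made-up scores:

```python
import numpy as np

rng = np.random.default_rng(3)
relevance = rng.normal(size=8)   # hypothetical per-fact relevance scores
budget = 3.0                     # soft constraint: select about 3 facts

# Relaxed selection variables, kept in the box [0, 1] by projection.
x = np.full(8, 0.5)
for _ in range(200):
    # Maximize relevance @ x subject to a quadratic budget penalty.
    grad = -relevance + 2.0 * (x.sum() - budget)
    x = np.clip(x - 0.05 * grad, 0.0, 1.0)   # gradient step + box projection

selected = np.where(x > 0.5)[0]
```

Because every step is a differentiable operation followed by a simple projection, the whole selection procedure can be backpropagated through, which is what allows the constraints to shape the neural representations during fine-tuning.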
We are witnessing a widespread adoption of artificial intelligence in healthcare. However, most of the advancements in deep learning (DL) in this area consider only unimodal data, neglecting other modalities, even though their multimodal interpretation is necessary for supporting diagnosis, prognosis and treatment decisions. In this work we present a deep architecture, explainable by design, which jointly learns modality reconstructions and sample classifications using tabular and imaging data. The explanation of the decision taken is computed by applying a latent shift that simulates a counterfactual prediction, revealing the features of each modality that contribute the most to the decision, together with a quantitative score indicating the modality importance. We validate our approach in the context of the COVID-19 pandemic using the AIforCOVID dataset, which contains multimodal data for the early identification of patients at risk of severe outcome. The results show that the proposed method provides meaningful explanations without degrading the classification performance.
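A latent-shift counterfactual of the kind described can be sketched on a toy linear model (the architecture, decoders and step size here are illustrative assumptions, not the paper's design): perturb the latent code against the classifier gradient, then measure how much each modality's reconstruction moves to rank modality importance:

```python
import numpy as np

rng = np.random.default_rng(4)
z = rng.normal(size=16)                  # latent code of one sample
w_clf = rng.normal(size=16)              # linear classifier head on the latent
W_img = rng.normal(size=(32, 16))        # toy decoder for the imaging modality
W_tab = rng.normal(size=(10, 16)) * 0.1  # toy decoder for the tabular modality

def score(z):
    """Sigmoid classifier output for latent z."""
    return 1 / (1 + np.exp(-(w_clf @ z)))

# Latent shift: step against the gradient of the score w.r.t. z,
# simulating a counterfactual in which the prediction weakens.
grad = score(z) * (1 - score(z)) * w_clf
z_cf = z - 2.0 * grad

# Modality importance: how much each reconstruction changes under the shift.
imp_img = np.linalg.norm(W_img @ z - W_img @ z_cf)
imp_tab = np.linalg.norm(W_tab @ z - W_tab @ z_cf)
```

The modality whose reconstruction changes most under the counterfactual shift is the one the decision relied on most, which is the intuition behind the quantitative importance score.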
Multivariate Hawkes processes are temporal point processes extensively applied to model event data with dependence on past occurrences and interaction phenomena. In the generalised nonlinear model, positive and negative interactions between the components of the process are allowed, therefore accounting for so-called excitation and inhibition effects. In the nonparametric setting, learning the temporal dependence structure of Hawkes processes is often a computationally expensive task, all the more with Bayesian estimation methods. In general, the posterior distribution in the nonlinear Hawkes model is non-conjugate and doubly intractable. Moreover, existing Markov chain Monte Carlo methods are often slow and not scalable to high-dimensional processes in practice. Recently, efficient algorithms targeting a mean-field variational approximation of the posterior distribution have been proposed. In this work, we unify existing variational Bayes inference approaches under a general framework that we theoretically analyse under easily verifiable conditions on the prior, the variational class, and the model. We notably apply our theory to a novel spike-and-slab variational class, which can induce sparsity through the connectivity graph parameter of the multivariate Hawkes model. Then, in the context of the popular sigmoid Hawkes model, we leverage an existing data augmentation technique and design adaptive and sparsity-inducing mean-field variational methods. In particular, we propose a two-step algorithm based on a thresholding heuristic to select the graph parameter. Through an extensive set of numerical simulations, we demonstrate that our approach enjoys several benefits: it is computationally efficient, can reduce the dimensionality of the problem by selecting the graph parameter, and is able to adapt to the smoothness of the underlying parameter.
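The model ingredients can be made concrete with a small sketch: a nonlinear (sigmoid-link) multivariate Hawkes intensity with exponential kernels, plus the kind of thresholding heuristic used to estimate the connectivity graph. All parameter values below are illustrative, not estimates:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

lam_max = 5.0                              # scale of the sigmoid link
mu = np.array([0.2, -0.1])                 # baselines (may be negative pre-link)
alpha = np.array([[0.8, 0.0],              # alpha[i, j]: effect of events of
                  [-0.5, 0.6]])            # component j on component i
beta = 2.0                                 # exponential kernel decay rate

events = [(0.3, 0), (0.7, 1), (1.1, 0)]    # observed (time, component) pairs

def intensity(t, i):
    """Conditional intensity of component i at time t: baseline plus kernel-
    weighted past events, passed through the sigmoid nonlinearity, which keeps
    the intensity positive even under inhibition (negative alpha)."""
    drive = mu[i] + sum(alpha[i, j] * beta * np.exp(-beta * (t - s))
                        for s, j in events if s < t)
    return lam_max * sigmoid(drive)

# Thresholding heuristic: zero out weak edges to recover a sparse graph.
graph = (np.abs(alpha) > 0.1).astype(int)
```

The sigmoid link is what makes negative interaction weights (inhibition) well-defined, and the thresholded `graph` plays the role of the connectivity parameter the spike-and-slab class induces sparsity on.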
The usage of deep neural networks in safety-critical systems is limited by our ability to guarantee their correct behavior. Runtime monitors are components aiming to identify unsafe predictions and discard them before they can lead to catastrophic consequences. Several recent works on runtime monitoring have focused on out-of-distribution (OOD) detection, i.e., identifying inputs that are different from the training data. In this work, we argue that OOD detection is not a well-suited framework for designing efficient runtime monitors, and that it is more relevant to evaluate monitors based on their ability to discard incorrect predictions. We call this setting out-of-model-scope detection and discuss the conceptual differences with OOD. We also conduct extensive experiments on popular datasets from the literature to show that studying monitors in the OOD setting can be misleading: (1) very good OOD results can give a false impression of safety; (2) comparison under the OOD setting does not allow identifying the best monitor for detecting errors. Finally, we also show that removing erroneous training data samples helps to train better monitors.
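The gap between the two evaluation settings is easy to demonstrate on simulated data: score the same confidence-threshold monitor (a) as an OOD detector and (b) on its actual job, rejecting incorrect predictions. The confidence distributions below are made up for illustration, not drawn from a real model:

```python
import numpy as np

rng = np.random.default_rng(5)
conf_correct = rng.uniform(0.7, 1.0, size=500)   # in-distribution, correct
conf_wrong_id = rng.uniform(0.4, 0.9, size=100)  # in-distribution, INCORRECT
conf_ood = rng.uniform(0.0, 0.5, size=200)       # out-of-distribution inputs

threshold = 0.6
def reject(c):
    return c < threshold

# (a) OOD view: fraction of OOD inputs rejected -- looks excellent.
ood_detection_rate = reject(conf_ood).mean()
# (b) Out-of-model-scope view: fraction of *incorrect* in-distribution
# predictions rejected -- the quantity that actually matters for safety.
error_detection_rate = reject(conf_wrong_id).mean()
```

In this toy setup the monitor rejects every OOD input yet misses a large share of in-distribution errors, which is exactly the "false impression of safety" the abstract warns about.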
There is an increasing need in our society to achieve faster advances in Science to tackle urgent problems, such as climate change, environmental hazards, sustainable energy systems, and pandemics, among others. In certain domains like chemistry, scientific discovery carries the extra burden of assessing the risks of the proposed novel solutions before moving to the experimental stage. Despite several recent advances in Machine Learning and AI to address some of these challenges, there is still a gap in technologies to support end-to-end discovery applications, integrating the myriad of available technologies into a coherent, orchestrated, yet flexible discovery process. Such applications need to handle complex knowledge management at scale, enabling knowledge consumption and production in a timely and efficient way for subject matter experts (SMEs). Furthermore, the discovery of novel functional materials strongly relies on the development of exploration strategies in the chemical space. For instance, generative models have gained attention within the scientific community due to their ability to generate enormous volumes of novel molecules across material domains. These models exhibit extreme creativity that often translates into low viability of the generated candidates. In this work, we propose a workbench framework that aims at enabling human-AI co-creation to reduce the time until the first discovery and the opportunity costs involved. This framework relies on a knowledge base with domain and process knowledge, and user-interaction components to acquire knowledge and advise the SMEs. Currently, the framework supports four main activities: generative modeling, dataset triage, molecule adjudication, and risk assessment.
The goal of autonomous vehicles is to navigate public roads safely and comfortably. To enforce safety, traditional planning approaches rely on handcrafted rules to generate trajectories. Machine learning-based systems, on the other hand, scale with data and are able to learn more complex behaviors. However, they often ignore the fact that the trajectory distributions of both other agents and the self-driving vehicle itself can be leveraged to improve safety. In this paper, we propose modeling a distribution over multiple future trajectories for both the self-driving vehicle and other road agents, using a unified neural network architecture for prediction and planning. During inference, we select the planning trajectory that minimizes a cost taking into account safety and the predicted probabilities. Our approach does not depend on any rule-based planners for trajectory generation or optimization, improves with more training data and is simple to implement. We extensively evaluate our method through a realistic simulator and show that the predicted trajectory distribution corresponds to different driving profiles. We also successfully deploy it on a self-driving vehicle on urban public roads, confirming that it drives safely without compromising comfort. The code for training and testing our model on a public prediction dataset and the video of the road test are available at https://woven.mobi/safepathnet
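The inference-time selection step described above can be sketched as follows. The trajectory shapes, the proximity cost, and the probability weighting are illustrative assumptions, not the paper's actual cost function:

```python
import numpy as np

rng = np.random.default_rng(6)
T = 10                                        # planning horizon (steps)
ego_trajs = rng.normal(size=(5, T, 2))        # 5 candidate ego plans, (x, y)
agent_trajs = rng.normal(size=(4, T, 2))      # 4 predicted agent futures
agent_probs = np.array([0.4, 0.3, 0.2, 0.1])  # predicted mode probabilities

def expected_cost(ego, agents, probs, safe_dist=1.0):
    """Probability-weighted safety cost: penalize every timestep at which
    the ego plan comes closer than safe_dist to a predicted agent mode."""
    d = np.linalg.norm(ego[None] - agents, axis=-1)            # (modes, T)
    proximity = np.clip(safe_dist - d, 0.0, None).sum(axis=1)  # per-mode penalty
    return float(probs @ proximity)

costs = np.array([expected_cost(e, agent_trajs, agent_probs) for e in ego_trajs])
best_plan = ego_trajs[costs.argmin()]
```

Weighting the penalty by the predicted mode probabilities is what lets the planner hedge against likely agent behaviors instead of only the single most likely one.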
The statistical supervised learning framework assumes an input-output set whose joint probability distribution is reliably represented by the training dataset. The learner is then required to output a prediction rule learned from the input-output pairs of the training dataset. In this work, we provide meaningful insights into the asymptotic equipartition property (AEP) \citep{Shannon:1948} in the context of machine learning, and illuminate some of its potential consequences for few-shot learning. We provide theoretical guarantees for reliable learning under the information-theoretic AEP, and for the generalization error with respect to the sample size. We then focus on a highly efficient recurrent neural net (RNN) framework and propose a reduced-entropy algorithm for few-shot learning. We also propose a mathematical intuition for the RNN as an approximation of a sparse coding solver. We verify the applicability, robustness, and computational efficiency of the proposed approach with image deblurring and optical coherence tomography (OCT) examples. Our experimental results demonstrate significant potential for improving the sample efficiency, generalization, and time complexity of learning models, which could therefore be exploited for real-time applications.
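The AEP the abstract builds on is easy to illustrate numerically: for an i.i.d. source, the per-symbol log-probability of a long sequence concentrates around the source entropy. The Bernoulli source and sample size below are arbitrary choices for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(7)
p = 0.2                                          # Bernoulli(p) source
H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # entropy in bits per symbol

# Draw one long sequence and compute -(1/n) log2 of its probability.
n = 100_000
x = (rng.random(n) < p).astype(int)
ones = x.sum()
log_prob = ones * np.log2(p) + (n - ones) * np.log2(1 - p)
empirical_rate = -log_prob / n                   # should be close to H
```

By the law of large numbers, `empirical_rate` converges to `H` as `n` grows: almost all sequences are "typical" with probability close to 2^(-nH), which is the property the paper's learning guarantees are stated in terms of.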